Wildfires are increasingly impacting the environment, human health, and safety. Among California's top 20 wildfires, those of 2020-2021 burned more than those of the last century. California's 2018 wildfire season alone caused $148.5 billion in damages. Of the millions of people affected, people with disabilities (around 15% of the world's population) are disproportionately impacted due to inadequate means of alerts. In this project, a multimodal wildfire prediction and personalized early-warning system was developed based on an advanced machine learning architecture. Sensor data from the Environmental Protection Agency and historical wildfire data from 2012 to 2018 were compiled to establish a comprehensive wildfire database, the largest of its kind. Next, a novel U-Convolutional-LSTM (Long Short-Term Memory) neural network was designed with a specialized architecture to extract key spatial and temporal features from contiguous environmental parameters indicative of impending wildfires. Environmental and meteorological factors were incorporated into the database and classified as leading indicators and trailing indicators, correlated with the risks of wildfire conception and propagation, respectively. Additionally, geological data was used to provide better wildfire risk assessment. This novel spatio-temporal neural network achieved >97% accuracy, versus ~76% for a conventional convolutional neural network, successfully predicting the most destructive wildfires of 2018 some 5-14 days in advance. Finally, a personalized early-warning system was proposed, tailored to individuals with sensory disabilities or respiratory exacerbation conditions. This technology would enable fire departments to anticipate and prevent wildfires before they strike, and provide early warnings to at-risk individuals so they can better prepare, thereby saving lives and reducing economic losses.
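The abstract does not specify the layer configuration of the U-Convolutional-LSTM, so the following is only a minimal sketch of the kind of convolutional-LSTM cell such a spatio-temporal architecture builds on; the channel sizes and wiring are illustrative assumptions, not the authors' model.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """One convolutional LSTM cell: all four gates come from a single 2D
    convolution over the concatenated input and hidden state, so spatial
    structure is preserved while temporal context is carried in (h, c)."""
    def __init__(self, in_ch: int, hidden_ch: int, kernel_size: int = 3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hidden_ch, 4 * hidden_ch,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        h, c = state  # each of shape (batch, hidden_ch, H, W)
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, (h, c)
```

Stacking such cells at multiple spatial resolutions, with skip connections between downsampling and upsampling paths as in a U-Net, would yield a U-Conv-LSTM of the kind described.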
The rectified linear unit (ReLU) is a highly successful activation function in neural networks as it allows networks to easily obtain sparse representations, which reduces overfitting in overparameterized networks. However, in network pruning, we find that the sparsity introduced by ReLU, which we quantify by a term called dynamic dead neuron rate (DNR), is not beneficial for the pruned network. Interestingly, the more the network is pruned, the smaller the dynamic DNR becomes during optimization. This motivates us to propose a method to explicitly reduce the dynamic DNR for the pruned network, i.e., de-sparsify the network. We refer to our method as Activating-while-Pruning (AP). We note that AP does not function as a stand-alone method, as it does not evaluate the importance of weights. Instead, it works in tandem with existing pruning methods and aims to improve their performance by selectively activating nodes to reduce the dynamic DNR. We conduct extensive experiments using popular networks (e.g., ResNet, VGG) via two classical and three state-of-the-art pruning methods. The experimental results on public datasets (e.g., CIFAR-10/100) suggest that AP works well with existing pruning methods and improves their performance by 3% - 4%. For larger-scale datasets (e.g., ImageNet) and state-of-the-art networks (e.g., vision transformers), we observe an improvement of 2% - 3% with AP as opposed to without it. Lastly, we conduct an ablation study to examine the effectiveness of the components comprising AP.
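The abstract does not give the precise formula for the dynamic DNR; one plausible reading, sketched below for PyTorch models, is the fraction of ReLU activations that are exactly zero over a stream of mini-batches. The function name and this operationalization are assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def dead_neuron_rate(model: nn.Module, data_loader, device: str = "cpu") -> float:
    """Fraction of ReLU outputs that are exactly zero across the loader."""
    zeros, total = 0, 0

    def count_zeros(module, inputs, output):
        nonlocal zeros, total
        zeros += (output == 0).sum().item()
        total += output.numel()

    # Attach a counting hook to every ReLU, run the data through, detach.
    handles = [m.register_forward_hook(count_zeros)
               for m in model.modules() if isinstance(m, nn.ReLU)]
    for x, _ in data_loader:
        model(x.to(device))
    for h in handles:
        h.remove()
    return zeros / max(total, 1)
```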
The importance of learning rate (LR) schedules in network pruning has been observed in a few recent works. As an example, Frankle and Carbin (2019) highlighted that winning tickets (i.e., accuracy-preserving subnetworks) cannot be found without applying a LR warmup schedule, and Renda, Frankle and Carbin (2020) demonstrated that rewinding the LR to its initial state at the end of each pruning cycle improves performance. In this paper, we go one step further by first providing a theoretical justification for the surprising effect of LR schedules. Next, we propose a LR schedule for network pruning called SILO, which stands for S-shaped Improved Learning rate Optimization. The advantages of SILO over existing state-of-the-art (SOTA) LR schedules are two-fold: (i) SILO has a strong theoretical motivation and dynamically adjusts the LR during pruning to improve generalization. Specifically, SILO increases the LR upper bound (max_lr) in an S-shape. This leads to an improvement of 2% - 4% in extensive experiments with various types of networks (e.g., Vision Transformers, ResNet) on popular datasets such as ImageNet and CIFAR-10/100. (ii) In addition to the strong theoretical motivation, SILO is empirically optimal in the sense of matching an Oracle, which exhaustively searches for the optimal value of max_lr via grid search. We find that SILO is able to precisely adjust the value of max_lr to be within the Oracle-optimized interval, resulting in performance competitive with the Oracle at significantly lower complexity.
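The abstract only states that max_lr grows in an S-shape across pruning; a natural way to realize that, sketched below with hypothetical endpoint and steepness values, is a logistic curve over the pruning cycles.

```python
import math

def silo_max_lr(cycle: int, total_cycles: int,
                lr_low: float = 0.1, lr_high: float = 0.5,
                steepness: float = 10.0) -> float:
    """LR upper bound that rises from lr_low to lr_high along an S-curve."""
    t = cycle / max(total_cycles - 1, 1)                 # progress in [0, 1]
    s = 1.0 / (1.0 + math.exp(-steepness * (t - 0.5)))   # logistic S-shape
    return lr_low + (lr_high - lr_low) * s

# e.g., feed silo_max_lr(c, 20) to a one-cycle LR scheduler at pruning cycle c.
```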
The 1$^{\text{st}}$ Workshop on Maritime Computer Vision (MaCVi) 2023 focused on maritime computer vision for Unmanned Aerial Vehicles (UAVs) and Unmanned Surface Vehicles (USVs), and organized several subchallenges in this domain: (i) UAV-based Maritime Object Detection, (ii) UAV-based Maritime Object Tracking, (iii) USV-based Maritime Obstacle Segmentation and (iv) USV-based Maritime Obstacle Detection. The subchallenges were based on the SeaDronesSee and MODS benchmarks. This report summarizes the main findings of the individual subchallenges and introduces a new benchmark, called SeaDronesSee Object Detection v2, which extends the previous benchmark by including more classes and footage. We provide statistical and qualitative analyses, and assess trends in the best-performing methodologies of over 130 submissions. The methods are summarized in the appendix. The datasets, evaluation code and the leaderboard are publicly available at https://seadronessee.cs.uni-tuebingen.de/macvi.
To efficiently accomplish tasks in multi-robot systems, a problem that must be addressed is Simultaneous Localization and Mapping (SLAM). LiDAR (Light Detection and Ranging) is used in many SLAM solutions due to its superior accuracy, but its performance degrades in featureless environments such as tunnels or long corridors. Centralized SLAM solves the problem with a cloud server, which requires substantial computational resources and lacks robustness against central node failure. To address these issues, we propose a distributed SLAM solution that estimates the trajectories of a group of robots using Ultra-WideBand (UWB) ranging and odometry measurements. The proposed approach distributes the processing among the robot team and significantly mitigates the computational concerns that arise in centralized SLAM. Our solution determines the relative pose between two robots (also known as a loop closure) by minimizing the error of UWB range measurements taken at different positions while the robots are in close proximity. UWB provides a good distance measure under line-of-sight conditions, but retrieving a precise pose estimate remains a challenge due to noise and the unpredictable paths of the robots. To deal with spurious loop closures, we use Pairwise Consistency Maximization (PCM) to examine loop closure quality and perform outlier rejection. The filtered loop closures are then fused with odometry in a distributed pose graph optimization (DPGO) module to recover the full trajectories of the robot team. Extensive experiments were conducted to validate the effectiveness of the proposed approach.
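As a toy illustration of the range-based loop-closure idea (a planar simplification, not the authors' 3D distributed implementation), the relative transform between two robots can be fit by least squares from each robot's own odometry positions and the UWB distances measured between them at matching instants; the function and variable names here are ours.

```python
import numpy as np
from scipy.optimize import least_squares

def relative_pose_2d(p_a, p_b, ranges):
    """Fit (theta, tx, ty) mapping robot B's frame into robot A's frame.

    p_a, p_b: (N, 2) positions in each robot's own odometry frame;
    ranges:   (N,) UWB distances measured between the robots at those times.
    """
    def residual(params):
        theta, tx, ty = params
        R = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        q = p_b @ R.T + np.array([tx, ty])     # B's positions in A's frame
        # Each residual: predicted inter-robot distance minus measured range.
        return np.linalg.norm(q - p_a, axis=1) - ranges

    return least_squares(residual, x0=np.zeros(3)).x
```

Loop closures recovered this way would then be screened by PCM before entering the pose graph.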
Although many studies have successfully applied transfer learning to medical image segmentation, very few of them have investigated the selection strategy when multiple source tasks are available for transfer. In this paper, we propose a prior-knowledge-guided and transferability-based framework to select the best source tasks among a collection of brain image segmentation tasks, to improve the transfer learning performance on the given target task. The framework consists of modality analysis, RoI (region of interest) analysis, and transferability estimation, such that the source task selection can be refined step by step. Specifically, we adapt state-of-the-art analytical transferability estimation metrics to medical image segmentation tasks and further show that their performance can be significantly boosted by filtering candidate source tasks based on modality and RoI characteristics. Our experiments on brain matter, brain tumor, and white matter hyperintensities segmentation datasets reveal that transferring from a different task under the same modality is often more successful than transferring from the same task under a different modality. Furthermore, within the same modality, transferring from a source task that has stronger RoI shape similarity with the target task can significantly improve the final transfer performance, and such similarity can be captured using the Structural Similarity index in the label space.
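As a sketch of that final step under stated assumptions (binary 2D masks of equal shape; scikit-image's structural_similarity as the SSIM implementation), RoI shape similarity between a source and a target task can be scored directly in label space:

```python
import numpy as np
from skimage.metrics import structural_similarity

def roi_shape_similarity(mask_src: np.ndarray, mask_tgt: np.ndarray) -> float:
    """SSIM between two binary RoI label masks of identical shape."""
    return structural_similarity(mask_src.astype(float),
                                 mask_tgt.astype(float),
                                 data_range=1.0)
```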
Embedding words in vector space is a fundamental first step in state-of-the-art natural language processing (NLP). Typical NLP solutions employ pre-defined vector representations to improve generalization by co-locating similar words in vector space. For instance, Word2Vec is a self-supervised predictive model that captures the context of words using a neural network. Similarly, GloVe is a popular unsupervised model incorporating corpus-wide word co-occurrence statistics. Such word embeddings have significantly boosted important NLP tasks, including sentiment analysis, document classification, and machine translation. However, the embeddings are dense floating-point vectors, making them expensive to compute and difficult to interpret. In this paper, we instead propose to represent the semantics of words with a few defining words that are related using propositional logic. To produce such logical embeddings, we introduce a Tsetlin Machine-based autoencoder that learns logical clauses in a self-supervised manner. The clauses consist of contextual words like "black," "cup," and "hot" to define other words like "coffee," thus being human-understandable. We evaluate our embedding approach on several intrinsic and extrinsic benchmarks, outperforming GloVe on six classification tasks. Furthermore, we investigate the interpretability of our embedding using the logical representations acquired during training. We also visualize word clusters in vector space, demonstrating how our logical embedding co-locates similar words.
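As a toy illustration of what such a logical embedding means (not Tsetlin Machine code), a learned clause is simply a conjunction of contextual literals that a given context either satisfies or does not:

```python
# Hypothetical clause for "coffee" built from the contextual words the
# abstract mentions; a context satisfies the clause when it contains
# every literal of the conjunction.
clause_coffee = {"black", "cup", "hot"}
context = {"hot", "black", "morning", "cup"}

print(clause_coffee.issubset(context))  # True: "coffee" matches this context
```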
Modern deep neural networks have achieved superhuman performance in tasks from image classification to game play. Surprisingly, these various complex systems with massive amounts of parameters exhibit the same remarkable structural properties in their last-layer features and classifiers across canonical datasets. This phenomenon is known as "Neural Collapse," and it was discovered empirically by Papyan et al. \cite{Papyan20}. Recent papers have shown theoretically that the global solutions of the network training problem under a simplified "unconstrained feature model" exhibit this phenomenon. We take a step further and prove the occurrence of Neural Collapse for deep linear networks under the popular mean squared error (MSE) and cross-entropy (CE) losses. Furthermore, we extend our research to imbalanced data for the MSE loss and present the first geometric analysis of Neural Collapse under this setting.
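For reference, the central geometric statement (from Papyan et al.'s empirical observations, which the theoretical works recover) is that the centered, renormalized class means form a simplex equiangular tight frame:

```latex
\[
\tilde{\mu}_k = \frac{\mu_k - \mu_G}{\lVert \mu_k - \mu_G \rVert},
\qquad
\langle \tilde{\mu}_i, \tilde{\mu}_j \rangle =
\begin{cases}
1 & i = j,\\
-\tfrac{1}{K-1} & i \neq j,
\end{cases}
\]
% where \mu_k is the mean last-layer feature of class k, \mu_G the global
% mean, and K the number of classes; within-class features also collapse
% to their class means.
```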
In this paper we derive a PAC-Bayesian-like error bound for a class of stochastic dynamical systems with inputs, namely, linear time-invariant stochastic state-space models (stochastic LTI systems for short). This class of systems is widely used in control engineering and econometrics; in particular, such systems represent a special case of recurrent neural networks. In this paper we 1) formalize the learning problem for stochastic LTI systems with inputs, 2) derive a PAC-Bayesian-like error bound for such systems, and 3) discuss various consequences of this error bound.
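For concreteness, a stochastic LTI state-space model with inputs is conventionally written in innovation form (the paper's exact parameterization may differ slightly):

```latex
\[
x_{t+1} = A\,x_t + B\,u_t + K\,e_t,
\qquad
y_t = C\,x_t + D\,u_t + e_t,
\]
% with hidden state x_t, input u_t, output y_t, and zero-mean i.i.d.
% innovation noise e_t; learning amounts to estimating (A, B, C, D, K)
% from input-output data.
```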
Denoising Diffusion Probabilistic Models (DDPMs) are emerging in text-to-speech (TTS) synthesis because of their strong capability of generating high-fidelity samples. However, their iterative refinement process in high-dimensional data space results in slow inference speed, which restricts their application in real-time systems. Previous works have explored speeding up inference by minimizing the number of inference steps, but at the cost of sample quality. In this work, to improve the inference speed of DDPM-based TTS models while achieving high sample quality, we propose ResGrad, a lightweight diffusion model which learns to refine the output spectrogram of an existing TTS model (e.g., FastSpeech 2) by predicting the residual between the model output and the corresponding ground-truth speech. ResGrad has several advantages: 1) Compared with other acceleration methods for DDPMs which need to synthesize speech from scratch, ResGrad reduces the complexity of the task by changing the generation target from the ground-truth mel-spectrogram to the residual, resulting in a more lightweight model and thus a smaller real-time factor. 2) ResGrad is employed in the inference process of the existing TTS model in a plug-and-play way, without re-training this model. We verify ResGrad on the single-speaker dataset LJSpeech and on two more challenging datasets with multiple speakers (LibriTTS) and a high sampling rate (VCTK). Experimental results show that, in comparison with other speed-up methods for DDPMs: 1) ResGrad achieves better sample quality at the same inference speed measured by real-time factor; 2) with similar speech quality, ResGrad synthesizes speech more than 10 times faster than baseline methods. Audio samples are available at https://resgrad1.github.io/.
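A minimal sketch of the plug-and-play inference flow the abstract describes; tts_model and resgrad are placeholders standing for a frozen FastSpeech 2-style model and the residual diffusion model, not the released API:

```python
import torch

@torch.no_grad()
def synthesize_mel(tts_model, resgrad, text):
    """Refine a frozen TTS model's output by adding a predicted residual."""
    coarse_mel = tts_model(text)                 # frozen TTS forward pass
    residual = resgrad.sample(cond=coarse_mel)   # few diffusion steps, conditioned on the coarse mel
    return coarse_mel + residual                 # refined mel, ready for the vocoder
```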